A Simple Permutation Test for Clusteredness

نویسنده

  • Michael Greenacre
چکیده

Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision where to cut the binary tree in order to determine the number of clusters to interpret and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability and a power study is performed on data sets with known clusteredness to study the type II error.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness

We introduce a new parameter, the clusteredness of data, and show how it can be used for estimating the output cardinality of a partial preaggregation operator. This provides the query optimizer with an important piece of information for deciding whether the application of partial preaggregation is beneficial. Experimental results are very promising, due to the high accuracy of the cardinality ...

متن کامل

A Non-Parametric Independence Test Using Permutation Entropy

In the present paper we construct a new, simple and powerful test for independence by using symbolic dynamics and permutation entropy as a measure of serial dependence. We also give the asymptotic distribution of an affine transformation of the permutation entropy under the null hypothesis of independence. An application to several daily financial time series illustrates our approach.

متن کامل

Monte-Carlo Permutation Tests

The distribution of running time could be anything at all. We didnt even say whether it is continuous or discrete. Even so, the null hypothesis does induce a probability distribution on the test statistic without any other information. Specifically, it implies that the samples are indistinguishable and exchangeable. Our test statistic is a mean of 6 values (algorithm B running times) minus the ...

متن کامل

MILP Formulation and Genetic Algorithm for Non-permutation Flow Shop Scheduling Problem with Availability Constraints

In this paper, we consider a flow shop scheduling problem with availability constraints (FSSPAC) for the objective of minimizing the makespan. In such a problem, machines are not continuously available for processing jobs due to preventive maintenance activities. We proposed a mixed-integer linear programming (MILP) model for this problem which can generate non-permutation schedules. Furthermor...

متن کامل

The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data

Background and Objectives: In recent years, new technologies have led to produce a large amount of data and in the field of biology, microarray technology has also dramatically developed. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011